This project was designed to develop techniques for 
adding low-cost speech synthesis to educational software. Four tasks 
were identified for the study: (1) select a microcomputer with a 
built-in analog-to-digital converter that is currently being used in 
educational environments; (2) determine the feasibility of 
implementing expansion and playback routines of compressed speech 
files on the targeted microcomputer; (3) test the target 
microcomputer using various bit rates; and (4) determine the 
feasibility of incorporating speech output capability into an 
authoring system currently being developed for speech recognition. 
The Apple Macintosh was selected, and routines were developed in the 
Pascal programming language for expanding and playing speech files at 
various data rates. It was found that the Macintosh is capable of 
producing a range of synthesized speech and that it is necessary to 
provide the courseware developer with a selection of bit rates. 
Final Report 

Abstract. The major goal of the original Phase I proposal was to develop a low-cost means for 
adding speech output to educational software. The planned vehicle for this was a 
low-cost speech recognition card for microcomputers commonly found in educational 
environments. Since funding for the recognition card was terminated (ED did not fund the Phase 
II proposal), it was decided that development of the synthesis capability should proceed on a 
particular educational machine. We chose the Apple Macintosh computer. This decision was 
made for two reasons: (1) the Macintosh computer has a built-in A/D converter capable of 
playing high-quality speech at no additional cost to the user, and (2) Scott Instruments is in the 
process of developing an authoring system on the Macintosh computer which could provide an 
easy-to-use method for integrating speech output into educational software. 

It has been shown that the Macintosh computer is capable of producing a range of resynthesized 
speech, from pleasing the most discriminating of ears (at 32 kbs.) to meeting the needs of those 
more concerned with quantity of speech (22 minutes of speech per disk at 4.8 kbs.). More 
importantly, the study has indicated that it is probably necessary to provide the courseware 
developer with a selection of bit rates, all of which should be available even within a particular 
program. Instructions and prompts should be generated at low bit rates for efficiency, and 
articulation models should be generated at high bit rates for accuracy. 
Anticipated Results: The above findings have led to a Phase II proposal for developing an 
authoring system capable of integrating speech output into courseware developed for the 
Macintosh computer. The proposed system will require additional hardware for the developer, 
enabling him to digitize and compress speech files at a variety of bit rates, but will 
require no additional hardware to run the courseware. A model of the authoring system that can 
be modified to fit these requirements is currently under development, and the extent of the 
modifications will be addressed in the Phase II proposal. 
Final Report. The current Phase I SBIR project was designed to develop techniques for adding 
low-cost speech synthesis to educational software. The original Phase I proposal was intended to 
fund the modification of a low-cost speech recognition module to support speech synthesis as 
well as speech recognition. The design of the recognition module was supported by ED SBIR grant 
number 300-84-0174, entitled "Phase I - Feature-Based Recognition for Speaker-Independent 
Voice Control of Microcomputers." A Phase II proposal was submitted as a result of this study in 
March of 1985. The Phase II proposal was rejected by the Department of Education, but the 
current Phase I feasibility study was accepted. Since support for the development of the 
recognition hardware was terminated, an alternative approach toward the development of a 
low-cost speech synthesis capability for educational software was adopted. Instead of developing 
general-purpose hardware for recognition and synthesis, we determined that the most 
cost-effective way to provide low-cost speech synthesis to the educational market was to develop 
a synthesis capability for a microcomputer with a built-in digital-to-analog converter capable 
of producing intelligible speech over a broad range of speech compression rates. 

The effort supported by this Phase I grant has resulted in a demonstration of the feasibility of 
producing highly intelligible speech output within a simple-to-use authoring system 
framework which is capable of generating courseware and which requires no additional 
hardware for student stations. This report will discuss the progress made toward the goal of 
enabling educational courseware developers, as well as non-computer-oriented teachers, to 
develop useful classroom courseware with speech output capabilities rapidly and efficiently. 

Identification and Significance of the Problem 

Speech input and output may be the most powerful adjunct technologies that can be added to 
computer assisted instructional programs. Some of the advantages of speech technology in this 
context are obvious, others more subtle. One obvious and well-known example of the power of 
speech technology is Texas Instruments' popular and highly successful Speak and Spell. This 
represents a perfect marriage between technology and education. We believe that there is no 
more effective way to teach spelling than through speech. Other obvious examples of 
appropriate educational applications for speech technology include the teaching of foreign 
languages, teaching English as a second language, the training of speech pathologists and 
audiologists, and the rehabilitation of persons with communication disorders. Speech technology 
can involve the student with the instructional machine in a way that cannot 
otherwise be achieved. 



Given that speech technology can play an important and unique role in Computer Assisted 
Instruction (CAI), the problem is one of getting the technology into the hands of the educator. One 
way of accomplishing this is to provide the educator with complete, ready-to-run, off-the-shelf 
software. This represents the simplest option for the educator. However, the technical 
know-how required to incorporate speech technology into educational software, as well as the 
additional hardware cost of adding a speech output capability to each student's 
workstation computer, raise the following questions: 1. How does an institution justify the 
purchase of a speech module for classroom use if no educational software exists? 2. Why would 
educational software developers incorporate speech technology into their offerings if no one has 
the appropriate hardware for voice output? 3. How do we deliver the educational power of 
speech technology into the hands of the educator in a way that allows for the creative use of the 
technology as a dynamic tool in the educational process? 



The answer apparently does not lie in the development of off-the-shelf, static lessons that cannot be 
modified by the educator. Courseware purchased over a period of time from many different 
developers will obviously lack continuity. The funding cycles of educational institutions ensure 
that educators will never be current in their courseware libraries. In fact, the lack of 
courseware can render an expensive computer laboratory useless to all but a few. 

The challenge is, therefore, to develop a means through which any educator may develop 
courseware with a speech input or output capability. Such a system must require minimal 
training: the goal is to provide a tool for all educators, not just those with computer skills. 
More important, the system must be cost-effective in its production of educational software. 
Additionally, the system must be designed to take advantage of future enhancements in the student 
workstation without requiring such improvements in the present. In other words, neither the 
courseware development system nor the student workstation should be allowed to make the 
other obsolete. 

What has been described is an authoring system capable of handling the integration of speech 
input/output into lessons. Using the courseware should require as little additional capital 
expense as possible in order for the system to become widely utilized. The present Phase I 
feasibility study has been directed at investigating the integration of speech technology into an 
authoring system that is simple enough for inexperienced computer users to use 
effectively. 

Scott Instruments has been developing voice-based authoring systems since 1981. Our first 
system was designated the VBLS®, voice-based learning system. This system utilized a $900 
speech recognition system available for the Apple II series computers. The most successful 
application of the VBLS system has been in foreign language training. Systems are currently 
being used daily across the state of Oklahoma for teaching German by satellite, a program 
established by Dr. Harry Wohlert, a professor in the Foreign Languages department at Oklahoma 
State University. The same system proved effective in the development of rehabilitative 
courseware for speech and hearing disorders during a Phase I study supported by the National 
Institute of Neurological and Communicative Disorders, grant number 1 R43 NS 2M17-0I 
CMS (OD), awarded September of 1984. These studies, and the experience of 
having approximately two hundred VBLS systems in the field, have taught us a great deal about the 
use of speech technology in education. 

The most important lessons are: (1) the cost per student station must be as low as possible, (2) 
the system must be simple enough for anyone to use, (3) high-resolution graphics are often a 
requirement, and (4) the system should have speech synthesis as well as 
speech recognition capabilities. 

Phase I - Goals and Accomplishments 

As mentioned above, because the current study was, in part, based on the funding of a companion 
study (which was not funded), the feasibility study was broken into the following tasks: 

(1) select a target microcomputer currently being used in 
educational environments, with a built-in analog-to-digital 
converter; 

(2) determine the feasibility of implementing expansion and 
playback routines of compressed speech files on the target 
microcomputer; 

(3) test the target microcomputer using various bit rates; 

(4) determine the feasibility of incorporating speech output 
capability into an authoring system currently being 
developed for speech recognition. 

®VBLS is a registered trademark of Scott Instruments Corporation. 



Target Microcomputer 

The microcomputer selected was the Apple Macintosh. The Macintosh computer has a built-in 
A/D converter, is becoming a popular computer in educational environments, uses the powerful 
Motorola 68000 microprocessor, has high-resolution graphics capabilities, and is a machine 
on which we are currently developing an authoring system that can be adapted to utilize speech 
synthesis in a simple and effective way. 

Speech Synthesis on the Macintosh 

Routines were developed in the Pascal programming language for expanding and playing speech 
files previously compressed on a VAX 11/750 development machine located at Scott 
Instruments. The data rate for the compressed files was 16 kilobits per second (kbs.). The quality of the 
compressed speech can be heard on the enclosed demonstration tape. 
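The report does not say which compression scheme the VAX produced or how the expansion routines decode it. Purely as an illustration of what "expanding" a compressed speech file involves, here is a minimal sketch of a hypothetical 1-bit adaptive delta modulation (ADM) codec: at an assumed 16-kHz sampling rate it produces one bit per sample, i.e. 16 kbs. The step-adaptation rule, the constants, and the names `adm_encode` and `adm_expand` are assumptions for this sketch, not Scott Instruments' method.

```python
from typing import List, Tuple

# Hypothetical ADM parameters; the report does not describe its coding scheme.
MIN_STEP = 16.0
MAX_STEP = 2048.0
GROW = 1.5     # enlarge the step when two successive bits agree (slope overload)
SHRINK = 0.66  # reduce the step when successive bits alternate (granular noise)
PCM_MIN, PCM_MAX = -32768.0, 32767.0


def _track(estimate: float, step: float, bit: int, prev_bit: int) -> Tuple[float, float]:
    """One tracking update shared by encoder and expander so they stay in lockstep."""
    estimate += step if bit else -step
    estimate = min(max(estimate, PCM_MIN), PCM_MAX)
    step = step * GROW if bit == prev_bit else step * SHRINK
    step = min(max(step, MIN_STEP), MAX_STEP)
    return estimate, step


def adm_encode(samples: List[int]) -> List[int]:
    """Compress 16-bit PCM samples to one bit per sample."""
    bits: List[int] = []
    estimate, step, prev_bit = 0.0, MIN_STEP, 0
    for s in samples:
        bit = 1 if s >= estimate else 0
        estimate, step = _track(estimate, step, bit, prev_bit)
        prev_bit = bit
        bits.append(bit)
    return bits


def adm_expand(bits: List[int]) -> List[int]:
    """Expand a bit stream back to PCM by replaying the encoder's tracking loop."""
    samples: List[int] = []
    estimate, step, prev_bit = 0.0, MIN_STEP, 0
    for bit in bits:
        estimate, step = _track(estimate, step, bit, prev_bit)
        prev_bit = bit
        samples.append(int(round(estimate)))
    return samples
```

Because the expander only replays the encoder's tracking loop, it needs a few arithmetic operations per output sample, which is the property that makes real-time expansion on a small processor plausible. Lower rates such as 4.8 kbs. would need a different, more elaborate scheme than this one.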

The 16-kbs. data rate represents the simplest of the expansion routines. The total 
amount of speech that can be stored on a Macintosh microfloppy at the 16-kbs. data rate is seven 
minutes. However, since we had developed non-real-time compression and expansion models on our 
development system that operated at data rates as low as 2.4 kbs., the decision was made to 
attempt to push data rates as low as possible on the Macintosh computer. Thus far, assembly 
language routines have been developed and debugged using our own 68000-based development 
system for data rates as low as 4.8 kbs. 
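The storage figures follow from dividing disk capacity by data rate. The report does not state the disk capacity it assumed; the sketch below assumes an 800K microfloppy (819,200 bytes) and treats one kilobit as 1,024 bits, two assumptions that reproduce the quoted figures closely (about 3.3, 6.7, 11.1, and 22.2 minutes). With 1,000-bit kilobits the results come out slightly higher. File-system overhead is ignored.

```python
# Assumptions (not stated in the report): an 800K Macintosh microfloppy
# holding 819,200 bytes, and 1 kilobit = 1,024 bits.
DISK_BYTES = 800 * 1024
BITS_PER_KILOBIT = 1024


def minutes_per_disk(kbs: float, disk_bytes: int = DISK_BYTES) -> float:
    """Minutes of continuous speech that fit on one disk at a given data rate."""
    bits_on_disk = disk_bytes * 8
    bits_per_second = kbs * BITS_PER_KILOBIT
    return bits_on_disk / bits_per_second / 60


if __name__ == "__main__":
    for rate in (32, 16, 9.6, 4.8):
        print(f"{rate:>5} kbs.  {minutes_per_disk(rate):5.1f} min")
```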

The following table shows the data rates for which 68000 assembly code has been developed and 
tested, along with the number of minutes of continuous speech that can be stored on a single 
Macintosh microfloppy disk. 

Data rate      Minutes of speech per disk 
32 kbs.        3.3 min 
16 kbs.        7 min 
9.6 kbs.       11 min 
4.8 kbs.       22 min 

These assembly language routines can be quickly adapted to the Macintosh computer for all of the 
above listed speech compression rates. The enclosed demonstration tape contains examples of the 
speech quality of each data rate. The sentences recorded were taken from the Arizona Articulation 
Proficiency Scale Sentence Test. The twenty-five sentences from the Arizona test were recorded 
at each of the four bit rates for a total of 100 recordings. Preliminary analysis of the speech 
quality was done by having five speech pathologists from the North Texas State University 
Division of Communication Disorders listen to the tapes and judge whether the target phones 
designated in the test were adequate as models for use in articulation therapy. The consensus was 
that the 32-kbs. model was completely adequate, and the 16- and 9.6-kbs. models were 
marginal. The 4.8-kbs. rate was judged inappropriate for use as speech models for teaching 
proper articulation, but was adequate for verbal instructions to a student. 
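One way an authoring system could act on these judgments is to record each clip at the lowest rate judged adequate for its purpose. The sketch below encodes the panel's findings as a lookup table; treating 16 and 9.6 kbs. as adequate for instructions is an inference (the panel only rated 4.8 kbs. explicitly for that use), and treating the "marginal" ratings as inadequate for articulation models is a conservative assumption. The names `Use`, `lowest_adequate_rate`, and `lesson_kilobits` are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Use(Enum):
    ARTICULATION_MODEL = "articulation model"
    INSTRUCTION = "instruction or prompt"


# Panel judgments from the pilot study. "Marginal" ratings are treated as
# inadequate for articulation models (a conservative assumption).
ADEQUATE_USES: Dict[float, Set[Use]] = {
    32.0: {Use.ARTICULATION_MODEL, Use.INSTRUCTION},
    16.0: {Use.INSTRUCTION},
    9.6: {Use.INSTRUCTION},
    4.8: {Use.INSTRUCTION},
}


def lowest_adequate_rate(use: Use) -> float:
    """Pick the lowest (most compact) data rate judged adequate for a use."""
    candidates = [rate for rate, uses in ADEQUATE_USES.items() if use in uses]
    if not candidates:
        raise ValueError(f"no data rate judged adequate for {use.value}")
    return min(candidates)


def lesson_kilobits(clips: List[Tuple[Use, float]]) -> float:
    """Storage for a lesson given (use, seconds) pairs, each at its chosen rate."""
    return sum(lowest_adequate_rate(use) * seconds for use, seconds in clips)
```

For a hypothetical lesson with 60 seconds of prompts and 25 seconds of articulation models, `lesson_kilobits` gives 60 × 4.8 + 25 × 32 = 1,088 kilobits, compared with 85 × 32 = 2,720 kilobits if everything were recorded at 32 kbs.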

One important finding in the pilot study was that the persons evaluating the synthesis objected 
more to unnatural-sounding transients in the recorded speech than to the speech quality and 
intelligibility. This indicates the need for developing speech editing tools that 
would enable the developer to modify parameters such as the amplitude envelope, the endpoints of the desired 
phrase, and possibly even the pitch contour. These editing tools are all possible within the 
constraints of the authoring system currently under development. 
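The report does not describe how such editing tools would work. As a minimal sketch, assuming 16-bit integer PCM samples, here are two simple operations of the kind mentioned: trimming a recording to its first and last energetic frames, and applying short linear fade-in/fade-out ramps, a common way to soften clicks at cut points that may address some of the transients the evaluators objected to. The frame size, threshold, and ramp length are hypothetical parameters, and pitch-contour editing is not attempted.

```python
from typing import List


def trim_endpoints(samples: List[int], frame: int, threshold: float) -> List[int]:
    """Keep samples from the first to the last frame whose mean |amplitude| reaches threshold."""
    active = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        if sum(abs(x) for x in chunk) / len(chunk) >= threshold:
            active.append(start)
    if not active:
        return []
    end = min(active[-1] + frame, len(samples))
    return samples[active[0]:end]


def apply_fades(samples: List[int], ramp: int) -> List[int]:
    """Scale the first and last `ramp` samples by a linear gain to soften onset/offset clicks."""
    n = len(samples)
    ramp = min(ramp, n // 2)
    out = list(samples)
    for i in range(ramp):
        gain = (i + 1) / (ramp + 1)
        out[i] = int(round(samples[i] * gain))
        out[n - 1 - i] = int(round(samples[n - 1 - i] * gain))
    return out
```

A frame-based mean amplitude is used rather than a per-sample test so that a single stray sample does not move the endpoints; a production tool would likely let the developer adjust the threshold interactively.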

Summary and Conclusions 

The major goal of the original Phase I proposal was to develop a low-cost means for adding 
speech output to educational software. The planned vehicle for this was a low-cost 
speech recognition card for microcomputers commonly found in educational environments. 
Since funding for the recognition card was terminated, the project was altered to develop 
synthesis capability on a particular educational machine, the Apple Macintosh computer. This 
decision was made for two reasons: (1) the Macintosh computer has a built-in A/D converter 
capable of playing high-quality speech at no additional cost to the user, and (2) Scott 
Instruments is in the process of developing an authoring system on the Macintosh computer 
which could provide an easy-to-use method for integrating speech output into educational 
software. 

In order to determine whether the Macintosh computer could generate high-quality speech 
synthesis in real time, expansion routines were developed in assembly language on a 
68000-based development machine at Scott Instruments. Since the required quality of speech 
was unknown, algorithms capable of generating speech compressed at 32 kbs., 16 kbs., 9.6 kbs., 
and 4.8 kbs. were all developed and tested. An informal test using trained speech 
pathologists as judges found that higher bit rates would be required for some applications and lower 
bit rates would be adequate for others. 

In conclusion, we have shown that the Macintosh computer is capable of producing a range of 
resynthesized speech, from pleasing the most discriminating of ears (at 32 kbs.) to meeting the 
needs of those more concerned with quantity of speech (22 minutes of speech per disk at 4.8 
kbs.). More importantly, the study has indicated that it is probably necessary to provide the 
courseware developer with a selection of bit rates, all of which should be available even within a 
particular program. Instructions and prompts should be generated at low bit rates for 
efficiency, and articulation models should be generated at high bit rates for accuracy. 

The above findings have led to a Phase II proposal for developing an authoring system capable of 
integrating speech output into courseware developed for the Macintosh computer. The proposed 
system will require additional hardware for the developer, enabling him to digitize and compress 
speech files at a variety of bit rates, but will require no additional hardware to run the 
courseware. A model of the authoring system that can be modified to fit these requirements is 
currently under development, and the extent of the modifications will be addressed in the Phase II 
proposal. 